File: USPT



Oct 16, 2001 



US-PAT-NO: 6304870 

DOCUMENT-IDENTIFIER: US 6304870 B1

TITLE: Method and apparatus of automatically generating a procedure for extracting 
information from textual information sources 

DATE-ISSUED: October 16, 2001 
INVENTOR-INFORMATION:

NAME                    CITY      STATE   ZIP CODE   COUNTRY
Kushmerick; Nicholas    Seattle   WA
Weld; Daniel S.         Seattle   WA
Doorenbos; Robert B.    Seattle   WA



ASSIGNEE-INFORMATION:

NAME                                          CITY      STATE   ZIP CODE   COUNTRY   TYPE CODE
The Board of Regents of the University of     Seattle   WA                           02
Washington, Office of Technology Transfer
Pattie Maes et al., Learning interface agents, Proceedings of AAAI-93, 1993.

H. Lieberman, Letizia: An agent that assists web browsing, Proc. 15th Int.
Joint Conf. on A.I., 924-929, 1995.

Robert Armstrong et al., Webwatcher: A learning apprentice for the world wide web,
Working Notes of the AAAI Spring Symposium: Information Gathering from
Heterogeneous, Distributed Environments, 6-12, 1995.

Lisa Dent et al., A personal learning apprentice, Proc. 10th Nat. Conf. on
A.I., 96-103, 1992.

Pattie Maes, Agents that reduce work and information overload, Comm. of the ACM,
37(7):31-40, 1994.

Tom Mitchell et al., Experience with a learning personal assistant, Comm. of the
ACM, 37(7):81-91, 1994.

O. Etzioni et al., A softbot-based interface to the Internet, Comm. of the ACM,
37(7):72-75, 1994.

ART-UNIT: 212 

PRIMARY-EXAMINER: Alam; Hosain T.
ASSISTANT-EXAMINER: Corrielus; Jean M.

ABSTRACT:

A procedure is disclosed for automatically constructing wrappers for performing
information extraction from sites such as Internet resources that display relevant
information interspersed with extraneous text fragments, such as HTML formatting
commands or advertisements. The procedure has three basic steps. First, a set of
example pages is collected with a subroutine named GatherExamples. GatherExamples
is provided with information describing how to pose example queries to the site
whose wrapper is to be learned. Second, these example pages are labeled by a
subroutine named LabelExamples, i.e., the information to be extracted from each
example is identified for use in the third step. The LabelExamples subroutine uses a
general framework for labeling pages using site-specific heuristics called
recognizers, as well as allowing users to correct and modify the recognized
instances. Finally, the labeled example pages are passed to a BuildWrapper
subroutine, which constructs a wrapper.

24 Claims, 2 Drawing figures 
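
The three steps named in the abstract of US 6304870 above (GatherExamples, LabelExamples, BuildWrapper) can be sketched as follows. This is only an illustration under stated assumptions, not the patented procedure: the in-memory `SITE`, the `recognizer` heuristic, and the delimiter-learning strategy inside `build_wrapper` (longest common text to the left and right of every labeled instance) are all hypothetical choices made for the example.

```python
import os
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a web site: query string -> result page.
SITE: Dict[str, str] = {
    "africa": "<html><p>Ad: buy now!</p><b>Congo</b> <i>242</i><br>"
              "<b>Egypt</b> <i>20</i><br></html>",
    "europe": "<html><p>Ad: cheap flights</p><b>Spain</b> <i>34</i><br></html>",
}
KNOWN_COUNTRIES = ["Congo", "Egypt", "Spain"]  # assumed recognizer knowledge


def recognizer(page: str) -> List[str]:
    """Site-specific heuristic (assumption): report known names found in the page."""
    return [name for name in KNOWN_COUNTRIES if name in page]


def gather_examples(queries: List[str], fetch: Callable[[str], str]) -> List[str]:
    """Step 1: pose example queries to the site and collect the result pages."""
    return [fetch(q) for q in queries]


def label_examples(pages: List[str],
                   recognize: Callable[[str], List[str]]) -> List[Tuple[str, List[str]]]:
    """Step 2: attach to each page the instances that should be extracted."""
    return [(page, recognize(page)) for page in pages]


def build_wrapper(labeled: List[Tuple[str, List[str]]]) -> Callable[[str], List[str]]:
    """Step 3: learn delimiters shared by every labeled instance."""
    lefts, rights = [], []
    for page, values in labeled:
        for value in values:
            start = page.index(value)
            lefts.append(page[:start])
            rights.append(page[start + len(value):])
    # Longest common suffix of the left contexts, longest common prefix of the right.
    left = os.path.commonprefix([s[::-1] for s in lefts])[::-1]
    right = os.path.commonprefix(rights)
    if not left or not right:
        raise ValueError("no common delimiters found in the examples")
    pattern = re.compile(re.escape(left) + "(.*?)" + re.escape(right), re.S)
    return lambda page: pattern.findall(page)


def learn_wrapper(queries: List[str]) -> Callable[[str], List[str]]:
    pages = gather_examples(queries, SITE.__getitem__)
    return build_wrapper(label_examples(pages, recognizer))
```

The learned wrapper no longer depends on the recognizer, so it can extract names the recognizer has never seen, provided the new page uses the same markup around them.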
Jul 4, 2000
DOCUMENT-IDENTIFIER: US 6085190 A

TITLE: Apparatus and method for retrieval of information from various structured 
NAME
PRIMARY-EXAMINER: Home Jean R.



ABSTRACT:

An information retrieval apparatus having a meta-data specifying section for 
specifying at least one attribute of information described in various forms of 
description, and a pattern learning section for creating rules for extracting 
information including the specified attribute based on the specified attribute. 

13 Claims, 11 Drawing figures 
L9: Entry 18 of 20



File: USPT 



Aug 10, 1999 



US-PAT-NO: 5937407 

DOCUMENT-IDENTIFIER: US 5937407 A

TITLE: Information retrieval apparatus using a hierarchical structure of schema 
DATE-ISSUED: August 10, 1999 



INVENTOR-INFORMATION:

NAME               CITY       STATE   ZIP CODE   COUNTRY
Sakata; Tsuyoshi   Yokohama                      JP



ASSIGNEE-INFORMATION:

NAME                                      CITY   STATE   ZIP CODE   COUNTRY   TYPE CODE
Digital Vision Laboratories Corporation                             JP        03

APPL-NO: 08/989206 [PALM]
DATE FILED: December 11, 1997 



FOREIGN-APPL-PRIORITY-DATA:

COUNTRY   APPL-NO    APPL-DATE
JP        8-332279   December 12, 1996
OTHER PUBLICATIONS

Microsoft Press Computer Dictionary, Second Edition, 1994, pp. 156-157, 344-345, 
ART-UNIT: 271

PRIMARY-EXAMINER: Black; Thomas G. 
ASSISTANT-EXAMINER: Alam; Hosain T.

ABSTRACT:

An information retrieving apparatus comprises a retrieve instruction executing means 
for executing a retrieve instruction based on a retrieval formula described based on 
an arbitrary schema, a schema conversion means for converting the retrieval formula 
into another retrieval formula according to another schema based on pregiven rules, 
and a schema management means for managing the rules for converting the retrieval 
formula into the other retrieval formula, wherein the retrieve instruction executing 
means retrieves desired information based on the other retrieval formula. In this 
case, preferred embodiments are as follows. 

8 Claims, 6 Drawing figures 
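
The schema conversion described in the abstract of US 5937407 above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the representation of a retrieval formula as a list of `(attribute, operator, value)` conditions, the rule format, and the schema names `"user"` and `"video"` are hypothetical and are not taken from the patent.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

Condition = Tuple[str, str, Any]              # (attribute, operator, value)
Rule = Tuple[str, Callable[[Any], Any]]       # (target attribute, value converter)

OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


class SchemaManager:
    """Holds the pregiven conversion rules for each (source, target) schema pair."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str], Dict[str, Rule]] = {}

    def register(self, source: str, target: str, rules: Dict[str, Rule]) -> None:
        self._rules[(source, target)] = rules

    def rules_for(self, source: str, target: str) -> Dict[str, Rule]:
        return self._rules[(source, target)]


def convert_formula(formula: List[Condition], rules: Dict[str, Rule]) -> List[Condition]:
    """Rewrite each condition into the attribute names and units of the other schema."""
    converted = []
    for attribute, op, value in formula:
        target_attribute, convert = rules[attribute]
        converted.append((target_attribute, op, convert(value)))
    return converted


def execute(formula: List[Condition], records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the records that satisfy every condition of the formula."""
    return [r for r in records if all(OPS[op](r[a], v) for a, op, v in formula)]


manager = SchemaManager()
manager.register("user", "video", {
    "title": ("name", lambda v: v),
    "length_min": ("length_sec", lambda v: v * 60),   # minutes -> seconds
})

records = [
    {"name": "News", "length_sec": 1800},
    {"name": "News", "length_sec": 7200},
    {"name": "Drama", "length_sec": 2700},
]
user_formula = [("title", "=", "News"), ("length_min", "<", 60)]
video_formula = convert_formula(user_formula, manager.rules_for("user", "video"))
matches = execute(video_formula, records)
```

Keeping the rules in a separate manager means a new target schema only needs a new rule table; the converter and the executor stay unchanged.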
L22: Entry 41 of 50



File: USPT 



May 27, 1997 



US-PAT-NO: 5634124 

DOCUMENT-IDENTIFIER: US 5634124 A

TITLE: Data integration by object management 

DATE-ISSUED: May 27, 1997 



INVENTOR-INFORMATION:

NAME                     CITY         STATE   ZIP CODE   COUNTRY
Khoyi; Dana              Dracut       MA
San Soucie; Marc         Tyngsboro    MA
Surprenant; Carolyn E.   Dracut       MA
Stern; Laura O.          Woburn       MA
Pham; Ly-Huong T.        Chelmsford   MA



ASSIGNEE-INFORMATION:

NAME                      CITY        STATE   ZIP CODE   COUNTRY   TYPE CODE
Wang Laboratories, Inc.   Billerica   MA                           02

APPL-NO: 08/450457 [PALM]
DATE FILED: May 25, 1995 



PARENT-CASE:

CROSS REFERENCE TO RELATED APPLICATIONS This Patent Application is a Continuation
Patent Application of co-pending U.S. patent application Ser. No. 066,688 for
INTEGRATION OF DATA BETWEEN TYPED DATA STRUCTURES BY MUTUAL, DIRECT INVOCATION
BETWEEN OBJECT MANAGERS CORRESPONDING TO DATA TYPES by Dana Khoyi et al., filed May
20, 1993 now U.S. Pat. No. 5,421,012 and since allowed, which was a Continuation
Patent Application of co-pending U.S. patent application Ser. No. 07/938,928 for
INTEGRATION OF DATA BETWEEN TYPED DATA STRUCTURES BY MUTUAL, DIRECT INVOCATION
BETWEEN DATA MANAGERS CORRESPONDING TO DATA TYPES by Dana Khoyi et al., filed Aug.
31, 1992 now U.S. Pat. No. 5,226,161 and since allowed, which was a Continuation
Patent Application of co-pending U.S. patent application Ser. No. 07/681,435 for
DATA INTEGRATION BY OBJECT MANAGEMENT by Dana Khoyi et al., filed Apr. 3, 1991 now
U.S. Pat. No. 5,206,951 and since allowed, which was a Continuation Patent
Application of co-pending U.S. patent application Ser. No. 07/088,622 for DATA
INTEGRATION BY OBJECT MANAGEMENT by Dana Khoyi et al., filed Aug. 21, 1987 and since
abandoned. The present patent application is related to U.S. patent application Ser.
No. 07/937,911 for DATA INTEGRATION BY OBJECT MANAGEMENT by Dana Khoyi et al., filed
Aug. 28, 1992 and U.S. patent application Ser. No. 07/936,980 for DATA INTEGRATION
BY OBJECT MANAGEMENT by Dana Khoyi et al., filed Aug. 28, 1993, both of which are
Divisional Applications of U.S. patent application Ser. No. 07/088,622 for DATA
INTEGRATION BY OBJECT MANAGEMENT by Dana Khoyi et al., filed Aug. 21, 1987 and since
abandoned. The present patent application is also related to U.S. patent application
Ser. No. 07/915,775 for CUSTOMIZATION BY AUTOMATIC RESOURCE SUBSTITUTION by Marc San
Soucie et al., filed Jul. 16, 1992, which was a Continuation Application of U.S.
patent application Ser. No. 07/088,176 for CUSTOMIZATION BY AUTOMATIC RESOURCE
SUBSTITUTION by Marc San Soucie et al., filed Aug. 28, 1987 and since abandoned. All
of the above related patent applications are assigned to the assignee of the present
patent application.

INT-CL: [06] G06 F 9/40, G06 F 17/30 

US-CL-ISSUED: 395/614; 395/615, 395/683 
US-CL-CURRENT: 707/103R; 709/315

FIELD-OF-SEARCH: 395/700, 395/650, 395/600, 364/DIG.1, 364/DIG.2, 364/283.4,
364/979.4
Schmidt et al.
May 1986
OTHER PUBLICATIONS

Lipkie, et al., "Stargraphics: An Object-Oriented Implementation,"
Computer Graphics, v. 16, No. 3, Jul. 1982, pp. 29-38.

Schmucker, "MACAPP: An Application Framework," BYTE, Aug. 1986, pp. 189-193. 
Kimura, "A Structure Editor for Abstract Document Objects," IEEE Transactions of 
Software Engineering, vol. SE-12, No. 3, Mar. 1986, pp. 417-435. 

Ursino, "Open Architecture Design Unites Diverse Systems," Electronics, Aug. 11, 
1983, pp. 116-117. 

Garrett, "Intermedia: Issues, Strategies, and Tactics in the Design of a Hypermedia 
Document System", Institute for Research in Information and Scholarship (IRIS), 
Brown University. 

ART-UNIT: 236 

PRIMARY-EXAMINER: Kriess; Kevin A.
ASSISTANT-EXAMINER: Richey; Michael T.

ABSTRACT:

An object based data processing system including an extensible set of object types
and a corresponding set of "object managers" wherein each object manager is a
program for operating with the data stored in a corresponding type of object. The
object managers in general support at least a standard set of operations. Any
program can effect performance of these standard operations on objects of any type
by making an "invocation" request. In response to an invocation request, object
management services (which are available to all object managers) identify and
invoke an object manager that is suitable for performing the requested operation on
the specified type of data. A mechanism is provided for linking data from one object
into another object. An object catalog includes information both about objects and
about links between objects. Data interchange services are provided for
communicating data between objects of different types, using a set of standard data
interchange formats. A matchmaker facility permits two processes that are to
cooperate in a data interchange operation to identify each other and to identify data
formats they have in common. A facility is provided for managing shared data
"resources". Customized versions of resources can be created and co-exist with
standard resources. A resource retrieval function determines whether a customized or
a standard resource is to be returned in response to each request for a resource.

3 Claims, 13 Drawing figures 
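
The invocation mechanism described in the abstract of US 5634124 above can be sketched as a small dispatch table. The class name `ObjectManagementServices`, the dictionary-based objects, and the operation names `display` and `size` are hypothetical choices for this illustration; the patent text does not specify them.

```python
from typing import Any, Callable, Dict

Operation = Callable[[Dict[str, Any]], Any]


class ObjectManagementServices:
    """Routes an invocation request to the manager registered for the object's type."""

    def __init__(self) -> None:
        self._managers: Dict[str, Dict[str, Operation]] = {}

    def register(self, object_type: str, operations: Dict[str, Operation]) -> None:
        # The set of object types is extensible: new managers are added at run time.
        self._managers[object_type] = operations

    def invoke(self, obj: Dict[str, Any], operation: str) -> Any:
        manager = self._managers.get(obj["type"])
        if manager is None or operation not in manager:
            raise LookupError(f"no manager for {operation!r} on type {obj['type']!r}")
        return manager[operation](obj)


services = ObjectManagementServices()
# Each manager supports the same standard operations for its own data type.
services.register("text", {
    "display": lambda o: o["data"],
    "size": lambda o: len(o["data"]),
})
services.register("table", {
    "display": lambda o: "\n".join(",".join(row) for row in o["data"]),
    "size": lambda o: len(o["data"]),
})

note = {"type": "text", "data": "hello"}
grid = {"type": "table", "data": [["a", "b"], ["c", "d"]]}
```

A caller only needs `services.invoke(obj, "display")`; it never has to know which manager handles the object's type.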
L42: Entry 53 of 64

File: USPT

May 7, 1996

US-PAT-NO: 5515534

DOCUMENT-IDENTIFIER: US 5515534 A

TITLE: Method of translating free-format data records into a normalized format based
on weighted attribute variants

DATE-ISSUED: May 7, 1996



INVENTOR-INFORMATION:

NAME             CITY         STATE   ZIP CODE   COUNTRY
Chuah; Mooi C.   Middletown   NJ
Wong; Wing S.    Holmdel      NJ

ASSIGNEE-INFORMATION:

NAME         CITY          STATE   ZIP CODE   COUNTRY   TYPE CODE
AT&T Corp.   Murray Hill   NJ                           02

APPL-NO: 07/953403 [PALM]
DATE FILED: September 29, 1992

INT-CL: [06] G06 F 17/10

US-CL-ISSUED: 395/600; 364/DIG.1, 364/282.1
US-CL-CURRENT: 707/101

FIELD-OF-SEARCH: 395/600, 364/419

PRIOR-ART-DISCLOSED:

U.S. PATENT DOCUMENTS

PAT-NO    ISSUE-DATE      PATENTEE-NAME       US-CL
4974191   November 1990   Amirghodsi et al.   395/275
5227971   July 1993       Nakajima et al.     364/419.02
5265065   November 1993   Turtle              395/600
5268839   December 1993   Kaji                364/419.03
5276616   January 1994    Kuga et al.         364/419.08
5333317   July 1994       Dann                395/600
5428777   June 1995       Perliski et al.     395/600
5434974   July 1995       Loucks et al.
ART-UNIT: 237

PRIMARY-EXAMINER: Amsbury; Wayne

ABSTRACT:

A facility is provided for normalizing the format of stored data records using a
dictionary that is generated from a training set of data records having predefined
formats.
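
As a rough illustration of the idea in the abstract of US 5515534 above, the sketch below builds a dictionary from training records whose fields are already known and uses it to normalize a free-format record. The weighting scheme (a simple count of how often a token appears under each attribute) and the address-style attributes are assumptions for this example; the patent's actual weighting of attribute variants is not given in this record.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Dictionary = Dict[str, Counter]


def build_dictionary(training: List[Dict[str, str]]) -> Dictionary:
    """Count, for every token, how often it occurs under each attribute."""
    weights: Dictionary = defaultdict(Counter)
    for record in training:
        for attribute, value in record.items():
            for token in value.lower().split():
                weights[token][attribute] += 1
    return weights


def normalize(text: str, dictionary: Dictionary, attributes: List[str]) -> Dict[str, str]:
    """Assign each token to its highest-weighted attribute, in a fixed field order."""
    fields: Dict[str, List[str]] = {attribute: [] for attribute in attributes}
    for token in text.split():
        counts = dictionary.get(token.lower())
        if counts:
            best_attribute = counts.most_common(1)[0][0]
            fields[best_attribute].append(token)
        else:
            # Tokens never seen in training are kept, not silently dropped.
            fields.setdefault("unknown", []).append(token)
    return {attribute: " ".join(tokens) for attribute, tokens in fields.items()}


TRAINING = [
    {"street": "Main St", "city": "Holmdel", "state": "NJ"},
    {"street": "Oak St", "city": "Middletown", "state": "NJ"},
]
ATTRIBUTES = ["street", "city", "state"]
dictionary = build_dictionary(TRAINING)
normalized = normalize("NJ Holmdel Oak St", dictionary, ATTRIBUTES)
```

Here the tokens arrive in an arbitrary order and come out grouped under `street`, `city`, and `state`; a token that never occurred in training lands in an extra `unknown` field.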
L1: Entry 2 of 3

File: USPT

Aug 7, 2001

US-PAT-NO: 6272495

DOCUMENT-IDENTIFIER: US 6272495 B1

TITLE: Method and apparatus for processing free-format data
DATE-ISSUED: August 7, 2001

INVENTOR-INFORMATION:

NAME                 CITY      STATE             ZIP CODE   COUNTRY
Hetherington; Greg   Kareela   New South Wales   2232       AU

US-CL-CURRENT: 707/101; 707/102, 707/4, 715/531



What is claimed is: 

1. A method of processing free-format data stored in a computing system,
comprising the steps of examining elements of the data to determine attributes
of the data, by examining the content of the elements and the contextual
relationships of elements to each other, to determine semantic and syntactic
information about the data, producing additional data relating to this
information, in the form of a text object which includes pointer means enabling
access to the elements of the free-format data, and the additional data being
accessible by a query processing means to provide at least one of answers to
queries relating to the semantic and syntactic information about the data and to
access the data to manipulate the data; and arranging the text object to act as
a layer, between the free-format data and the query processing means, for at
least one of interpretation and manipulation of the data;

processing a plurality of free-format data records and producing a text object
associated with each free-format data record; and

producing a text object index including attribute-type identifiers for elements
of each data record and pointers to each data record, whereby the index may be
queried by queries relating to semantic and syntactic information about the data
and the data may be accessed via the index.

2. A method in accordance with claim 1, wherein the free-format data is stored 
as a record in a free-format field of a database. 

3. A method in accordance with claim 1, wherein the data remains stored in the 
computing system as it was originally stored, whereby it may be accessed by 
other applications.

4. A method in accordance with claim 1, wherein the text object includes an 
attribute-type identifier which identifies an attribute type of an element of
the data. 

5. A method in accordance with claim 1, wherein the text object includes a value 
indicating the character length of an element of the data. 

6. A method in accordance with claim 4, wherein the text object includes a value 
indicating whether an element is low level in a syntactic hierarchy or higher 
level whereby the value may be used for matching purposes when matching data 
with other data processed in accordance with the method. 
7. A method in accordance with claim 1, the text object including a match
weighting value for an element of the data, which can be used to determine the 
significance of the element when matching with other free format data. 

8. A method in accordance with claim 1, wherein the text object comprises a 
plurality of component nodes arranged according to the semantic structure of the 
free-format data, the component nodes being arranged in a hierarchy 
corresponding to the semantic structure of the free-format data and each 
component node including additional data relating to the corresponding element 
of the free-format data. 

9. A method in accordance with claim 1, comprising the further step of 
generating matching values for comparing an element of the free-format data with 
an element of other free-format data processed in accordance with the present 
method. 

10. A method in accordance with claim 9 where the matching value is a phonetic 
value for phonetically comparing elements of free-format data. 

11. A method in accordance with claim 1, wherein the text object includes 
implied data relating to information implied from the free-format data. 

12. A method in accordance with claim 1, wherein a plurality of free-format data 
records are processed and a text object associated with each free-format data 
record is produced. 

13. A method in accordance with claim 12, wherein the text object is stored in 
the computer system whereby it is available for queries on the associated 
free-format data record via the query processing means. 

14. A method in accordance with claim 1 wherein each entry in the text object 
index includes a representative value key, which gives a value representative of 
a feature of the element associated with the attribute-type identifier.

15. A method in accordance with claim 1, comprising the further step of carrying 
out a domain construction process to construct a domain object from domain 
definition data files, the domain object being arranged to carry out the 
examination process by parsing the free-format data in accordance with grammar 
rules.

16. A method in accordance with claim 15, wherein the domain definition data 
files include character definition data, regular expression definition data and 
grammar data. 

17. A method in accordance with claim 1, wherein the free-format data is postal 
address data. 

18. A method in accordance with claim 1 wherein the query processing means can
carry out normal database operations on the data via the additional data. 

19. A process system for processing free-format data stored in a computing 
system, the apparatus including means for examining elements of the data to 
determine attributes of the data, by examining the content of the elements and 
the contextual relationships of elements to each other, to determine semantic 
and syntactic information about the data, means for producing additional data 
relating to this information, in the form of a text object which includes 
pointer means enabling access to the elements of the free-format data, and a 
query processing means which is arranged to access the additional data to 
provide at least one of answers to queries relating to the semantic and 
syntactic information about the data and access the data to manipulate the data; 
and arranging the text object to act as a layer, between the free-format data 
and the query processing means, for at least one of interpretation and 
manipulation of the data; 

arranging the system is arranged to process a plurality of free-format data 
records and produce a text object associated with each free-format data record; 
and 
arranging the means for producing additional data to produce a text object index
including attribute-type identifiers for elements of each data record and 
pointers to each data record and arranging the query processing means to access 
the text object index to provide answers to queries relating to the semantic and 
the syntactic information about the data and to access the data to manipulate 
the data. 

20. A processing system in accordance with claim 19, wherein the free-format 
data is stored as a record in a free-format field of a database. 

21. A processing system in accordance with claim 19, wherein the examining means 
does not affect the storage of the data. 

22. A processing system in accordance with claim 20, wherein the text object 
includes an attribute-type identifier which identifies an attribute type of an
element of the data. 

23. A processing system in accordance with claim 20, wherein the text object 
includes a value indicating the character length of an element of the data. 

24. A processing system in accordance with claim 22, wherein the text object 
includes a value, indicating whether an attribute-type of an element is low
level in a syntactic hierarchy or high level whereby the value may be used for 
matching purposes when matching with other free-format data processed in 
accordance with this system. 

25. A processing system in accordance with claim 20, wherein the text object 
includes a match weighting value for an element of the data, which can be used 
to determine the significance of the element when matching with other 
free-format data. 

26. A processing system in accordance with claim 20, wherein the text object 
comprises a plurality of component nodes arranged according to the semantic 
structure of the free-format data, the component nodes being arranged in a 
hierarchy corresponding to the semantic structure of the free-format data, and 
each component node including additional data relating to the corresponding 
element of free-format data. 

27. A processing system in accordance with any one of claims 19 to 26, the text 
object means for generating matching values for comparing an element of the 
free-format data with an element of other free-format data processed by the 
processing system. 

28. A processing system in accordance with claim 27, wherein the matching value 
is a phonetic value for phonetically comparing elements of free-format data. 

29. A processing system in accordance with claim 20, wherein the text object 
includes implied data relating to information implied from the free-format data. 

30. A processing system in accordance with claim 20, wherein the system is 
arranged to process a plurality of free-format data records and produce a text 
object associated with each free-format data record. 

31. A processing system in accordance with claim 30, wherein the text object 
index includes representative value keys for entries, which give a value 
representative of a feature of the elements associated with the attribute-type 
identifier for the entry for facilitating matching with other free-format data 
process in accordance with the system. 

32. A processing system in accordance with claim 20, further comprising a domain 
object, the domain object being arranged to carry out the examination process by 
parsing the free-format data in accordance with grammar rules. 

33. A processing system in accordance with claim 32, wherein the domain object
is produced by a domain construction process from domain definition data files. 
34. A processing system in accordance with claim 33, further comprising a domain
constructor for carrying out the domain construction process. 

35. A processing system in accordance with claim 33, wherein the domain 
definition data files include character definition data, regular expression 
definition data and grammar data. 

36. A processing system in accordance with claim 20, wherein the free-format 
data is postal address data. 

37. A processing system in accordance with claim 20, wherein the query 
processing means is arranged to carry out normal database operations on the data 
via the additional data. 

38. A method of enabling access to free-format data stored in a computer system, 
including a plurality of free-format data records, comprising the steps of 
storing additional data relating to semantic and syntactic information about the 
data of each data record, the additional data being in the form of a text object 
index which includes attribute-type identifiers for elements of each data
record and pointers to each data record, the text object index being accessible 
by a query processing means to provide at least one of answers to queries 
relating to the semantic and syntactic information about the data and access the 
data to manipulate the data; and arranging the text object index to act as a 
layer, between the free-format data and the query processing means, for at least 
one of interpretation and manipulation of the data. 

39. A processing system for enabling access to free-format data stored in a 
computing system, including a plurality of free-format data records, the 
processing system comprising the additional data relating to semantic and 
syntactic information about the free-format data for each data record, the 
additional data being in the form of a text object index which includes 
attribute-type identifiers for elements of each data record and pointers to
each data record, and a query processing means arranged to access the additional 
data to provide at least one of answers to queries relating to the semantic and 
syntactic information about the data and access the data to manipulate the data; 
and arranging the text object index to act as a layer, between the free-format 
data and the query processing means, for at least one of interpretation and 
manipulation of the data. 
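
The text object and text object index recited in the claims above can be sketched as follows. This is a simplified illustration under explicit assumptions: the regular expressions in `PATTERNS` stand in for the domain definition data of claims 15 and 16, the attribute-type names are invented, and the contextual (grammar-based) examination of the claims is omitted. Only the content of each element is examined here.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical domain definition: one regular expression per attribute type.
PATTERNS = [
    ("POSTCODE", re.compile(r"\b\d{4}\b")),
    ("STATE", re.compile(r"\b(?:NSW|VIC|QLD)\b")),
    ("STREET_TYPE", re.compile(r"\b(?:St|Rd|Ave)\b")),
]


@dataclass(frozen=True)
class Element:
    attribute_type: str   # attribute-type identifier (claim 4)
    start: int            # pointer into the unchanged free-format record
    length: int           # character length of the element (claim 5)


def build_text_object(record: str) -> List[Element]:
    """Examine a record and describe its elements without modifying the record."""
    elements = [
        Element(attribute_type, match.start(), match.end() - match.start())
        for attribute_type, pattern in PATTERNS
        for match in pattern.finditer(record)
    ]
    return sorted(elements, key=lambda element: element.start)


def build_index(records: List[str]) -> Dict[Tuple[str, str], List[int]]:
    """Text object index: (attribute type, value) -> positions of matching records."""
    index: Dict[Tuple[str, str], List[int]] = {}
    for record_id, record in enumerate(records):
        for element in build_text_object(record):
            value = record[element.start:element.start + element.length].upper()
            index.setdefault((element.attribute_type, value), []).append(record_id)
    return index


def query(index: Dict[Tuple[str, str], List[int]], records: List[str],
          attribute_type: str, value: str) -> List[str]:
    """Answer a query through the index, then fetch the original records."""
    return [records[i] for i in index.get((attribute_type, value.upper()), [])]


RECORDS = ["12 Smith St Kareela NSW 2232", "4 High Rd Carlton VIC 3053"]
INDEX = build_index(RECORDS)
```

Because each element stores only an offset and a length, the stored records stay exactly as they were written, which is the property claim 3 describes; the index acts as the layer between the raw data and the query.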